knitr::opts_chunk$set(echo = FALSE, message = FALSE)
library(Seurat)
library(ggplot2)
library(data.table)
library(MAST)
library(SingleR)
library(dplyr)
library(tidyr)
library(limma)
library(scRNAseq)
## Warning: Using `as.character()` on a quosure is deprecated as of rlang 0.3.0.
## Please use `as_label()` or `as_name()` instead.
## This warning is displayed once per session.
Reference Datasets
* ImmGen Data: normalized expression values of 830 microarray samples of pure mouse immune cells, generated by the Immunologic Genome Project
* Mouse RNA Seq Data: normalized expression values of 358 bulk RNA-seq samples of sorted cell populations found on GEO
Cell type specific marker gene expression. Genes were added to the list in two different ways: canonical markers that are well known in the field, and genes that distinguished clusters and were found to play a key role in specific cells.
Ighd: immunoglobulin heavy constant delta. Seems to clearly be expressed by B-cells, but still working on a good reference.
Gata2: from the Krause paper, a transcription factor that is required for both lineages but binds in different combinations ref
Cd68: a human macrophage marker ref. A more general ref
Vcam1: found papers using Vcam1+ monocytes, but haven’t found a great reference.
Alas2: an erythroid-specific 5-aminolevulinate synthase gene ref
Gata3: plays a role in the regulation of T-cells ref
Vwf and Itga2b: Markers for megakaryocytes
Mcpt8 and Prss34: mast cell proteases
[Figure: cell surface marker diagram]
According to this paper, all long-term HSCs in mice are Hoxb5+.
Other markers for HSPCs: Kit, Flt3 (Negative), Ly6a, Cd34, Slamf1.
We saw in the SingleR results that CMPs had the highest correlation with stem cells. From the above figure we can see that CMPs show a distinct pattern of cell surface markers: Kit+ Sca1(-/low) Cd34+ FcgR(low).
Conclusion: the identification of CMPs seems pretty spot on
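To make the "highest correlation" statement concrete, here is a deliberately simplified sketch of correlation-based annotation. SingleR scores each cell by Spearman correlation against reference samples, but it also selects marker genes, aggregates scores per label, and fine-tunes the result, none of which is reproduced here. The expression values, the `annotate_cell` helper, and the three reference labels are invented for illustration and are not taken from the ImmGen or mouse RNA-seq references.

```r
# Toy reference profiles: rows are genes, columns are cell types.
# All numbers are made up for illustration only.
genes <- c("Kit", "Cd34", "Ly6a", "Mpo", "Alas2")
ref <- matrix(
  c(9, 8, 7, 2, 1,   # StemCell
    1, 1, 2, 9, 3,   # Neutrophil
    3, 1, 2, 1, 9),  # Erythroid
  nrow = length(genes),
  dimnames = list(genes, c("StemCell", "Neutrophil", "Erythroid"))
)

# Toy expression vector for one query cell (also made up).
cell <- c(Kit = 7, Cd34 = 6, Ly6a = 3, Mpo = 1, Alas2 = 0)

# Hypothetical helper: Spearman correlation of the cell against each
# reference profile over the shared genes; the best-scoring label wins.
annotate_cell <- function(cell, ref) {
  shared <- intersect(names(cell), rownames(ref))
  scores <- apply(ref[shared, , drop = FALSE], 2, function(profile) {
    cor(cell[shared], profile, method = "spearman")
  })
  list(label = names(which.max(scores)), scores = scores)
}

result <- annotate_cell(cell, ref)
print(result$label)
print(round(result$scores, 2))
```

Rank-based correlation is used because it compares the ordering of genes rather than their absolute values, which helps when the query and the reference come from different platforms.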
Comparing Mpl to Migr1
## [1] "Top 6 Up-Regulated DE Genes"
##                p_val avg_logFC pct.1 pct.2    p_val_adj
## Slpi    2.724286e-23  2.864115 0.964 0.739 4.933955e-19
## Akr1c18 2.491254e-15  2.638171 0.835 0.043 4.511911e-11
## Ccl4    6.207123e-12  2.556199 0.911 0.391 1.124172e-07
## Furin   9.882977e-31  1.838023 0.989 0.783 1.789906e-26
## Cfp     5.379440e-16  1.594472 0.940 0.348 9.742703e-12
## Ccl6    3.882870e-21  1.450810 0.978 0.739 7.032266e-17
## [1] "Top 6 Down-Regulated DE Genes"
##               p_val avg_logFC pct.1 pct.2    p_val_adj
## Hmgb2  6.395122e-11 -1.211850 0.927 1.000 1.158220e-06
## Lmo4   1.686466e-26 -1.281644 0.377 0.739 3.054359e-22
## Nedd4  4.065083e-27 -1.368005 0.788 1.000 7.362271e-23
## Csrp3  3.630654e-32 -1.604731 0.287 0.913 6.575477e-28
## Mpo    1.815986e-17 -1.634382 0.314 0.478 3.288933e-13
## S100a9 9.667419e-07 -1.976548 0.998 0.957 1.750866e-02
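As a sanity check on the `p_val_adj` column: Seurat reports this value as a Bonferroni correction over all genes in the dataset. The sketch below recomputes it for a few rows copied from the tables above. The gene count of 18111 is an assumption inferred from the ratio `p_val_adj / p_val` in those rows, not a number stated anywhere in this analysis.

```r
# A few rows copied from the DE tables above.
de <- data.frame(
  gene      = c("Slpi", "Furin", "Csrp3", "S100a9"),
  p_val     = c(2.724286e-23, 9.882977e-31, 3.630654e-32, 9.667419e-07),
  p_val_adj = c(4.933955e-19, 1.789906e-26, 6.575477e-28, 1.750866e-02)
)

# Assumption: number of genes tested, inferred from p_val_adj / p_val.
n_genes <- 18111

# Bonferroni correction: multiply each p-value by the number of tests,
# capped at 1.
de$recomputed <- p.adjust(de$p_val, method = "bonferroni", n = n_genes)

print(de)
```

S100a9 has a large fold change, but its adjusted p-value (about 0.0175) is far weaker than the others, so it is the least secure entry in the down-regulated list.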
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
##
##   0   1   2   3   4   5   6
## 190 166  85  74  36  27  14
##
##     Migr1 Mpl Wildtype
##   0     8 180        2
##   1     0 166        0
##   2     2  83        0
##   3    10  48       16
##   4     0  36        0
##   5     0  27        0
##   6     3  11        0
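The cross-tabulation above is easier to interpret as per-subcluster proportions. The sketch below rebuilds the table from the printed counts and normalizes each row. In the actual Seurat workflow the counts would more likely come from something like `table(Idents(obj), obj$condition)`, where `obj` and the `condition` metadata column are hypothetical names that do not appear in the source.

```r
# Counts copied from the subcluster-by-condition table above.
counts <- matrix(
  c( 8, 180,  2,
     0, 166,  0,
     2,  83,  0,
    10,  48, 16,
     0,  36,  0,
     0,  27,  0,
     3,  11,  0),
  ncol = 3, byrow = TRUE,
  dimnames = list(subcluster = as.character(0:6),
                  condition  = c("Migr1", "Mpl", "Wildtype"))
)

# Row totals should match the subcluster sizes printed above.
cluster_sizes <- rowSums(counts)

# Fraction of each subcluster contributed by each condition.
frac <- prop.table(counts, margin = 1)

print(cluster_sizes)
print(round(frac, 2))
```

Subclusters 1, 4, and 5 consist entirely of Mpl cells, while subcluster 3 is the only one with a substantial Wildtype contribution.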
Expression appears broad across all the subclusters, with the exception of relatively low expression of Prss34 in subcluster 5 (all Mpl).
From the cell surface marker diagram shown earlier, MEPs would follow this pattern: Kit+ Ly6a- Cd34- Fcgr2b-
No specific subcluster shows that pattern, though once again subcluster 5 has the greatest Kit expression.
## Warning in FetchData(object = object, vars = features, cells = cells): The
## following requested variables were not found: Ahspp
### MEP/ERP
## Warning in FetchData(object = object, vars = features, cells = cells): The
## following requested variables were not found: C3orf58
## Warning in FetchData(object = object, vars = features, cells = cells): The
## following requested variables were not found: C6orf25
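The `FetchData` warnings above mean that some requested symbols are not row names of the object. A pre-check along the following lines can catch this before plotting. Here `available` is a small hypothetical stand-in for `rownames()` of the Seurat object, and the suggestion that `Ahspp` might be a typo for `Ahsp` is an assumption, not something the source confirms. Symbols of the form `C3orf58` follow human open-reading-frame naming, which is one plausible reason they are missing from a mouse dataset.

```r
# Requested features: one valid symbol plus the three from the warnings.
requested <- c("Kit", "Ahspp", "C3orf58", "C6orf25")

# Hypothetical stand-in for rownames(seurat_object).
available <- c("Kit", "Ahsp", "Gata1", "Klf1")

# Symbols that would trigger the "not found" warning.
missing <- setdiff(requested, available)

# Flag human-style ORF symbols (e.g. C3orf58) among the missing ones.
orf_style <- grepl("^C[0-9XY]+orf[0-9]+$", missing)

# For a suspected typo, find the closest available symbol by edit distance.
distances <- drop(adist("Ahspp", available))
nearest <- available[which.min(distances)]

print(missing)
print(missing[orf_style])
print(nearest)
```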